consensus optimization
Non-Ergodic Convergence Algorithms for Distributed Consensus and Coupling-Constrained Optimization
Abstract--We study distributed convex optimization with two ubiquitous forms of coupling: consensus constraints and gl obal affine equalities. Without smooth ness or strong convexity, we establish non-ergodic sublinear ra tes of order O (1/ k) for both the objective optimality and the consensus violation. Leveraging duality, we then show that the eco nomic dispatch problem admits a dual consensus formulation, and t hat applying the same algorithm to the dual economic dispatch yi elds non-ergodic O (1/ k) decay for the error of the summation of the cost over the network and the equality-constraint resid ual under convexity and Slater's condition. Numerical results on the IEEE 118-bus system demonstrate faster reduction of bot h objective error and feasibility error relative to the state -of-the-art baselines, while the dual variables reach network-wide con sensus. This paper studies large-scale convex optimization proble ms formulated over networks, which frequently arise in engineering applications.
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Reviews: The Numerics of GANs
This paper presents a novel analysis of the typical optimization algorithm used in GANs (simultaneous gradient ascent) and identifies problematic failures when the Jacobian has large imaginary components or zero real components. Motivated by these failures, they present a novel consensus optimization algorithm for training GANs. The consensus optimization is validated on a toy MoG dataset as well as CIFAR-10 and CelebA in terms of sample quality and inception score. I found this paper enjoyable to read and the results compelling. My primary concern is the lack of hyperparameter search when comparing optimization algorithms and lack of evidence that the problems identified with simultaneous gradient ascent are truly problems in practice.
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies on high communication cost on the central node. Motivated by this, we ask, can decentralized algorithms be faster than its centralized counterpart? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has comparable total computational complexities to C-PSGD but requires much less communication cost on the busiest node.
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
A Zeroth-Order Proximal Algorithm for Consensus Optimization
Wang, Chengan, Ou, Zichong, Lu, Jie
This paper considers a consensus optimization problem, where all the nodes in a network, with access to the zeroth-order information of its local objective function only, attempt to cooperatively achieve a common minimizer of the sum of their local objectives. To address this problem, we develop ZoPro, a zeroth-order proximal algorithm, which incorporates a zeroth-order oracle for approximating Hessian and gradient into a recently proposed, high-performance distributed second-order proximal algorithm. We show that the proposed ZoPro algorithm, equipped with a dynamic stepsize, converges linearly to a neighborhood of the optimum in expectation, provided that each local objective function is strongly convex and smooth. Extensive simulations demonstrate that ZoPro converges faster than several state-of-the-art distributed zeroth-order algorithms and outperforms a few distributed second-order algorithms in terms of running time for reaching given accuracy.
Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization
Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints.
Exact Subspace Diffusion for Decentralized Multitask Learning
Wadehra, Shreya, Nassif, Roula, Vlaski, Stefan
Classical paradigms for distributed learning, such as federated or decentralized gradient descent, employ consensus mechanisms to enforce homogeneity among agents. While these strategies have proven effective in i.i.d. scenarios, they can result in significant performance degradation when agents follow heterogeneous objectives or data. Distributed strategies for multitask learning, on the other hand, induce relationships between agents in a more nuanced manner, and encourage collaboration without enforcing consensus. We develop a generalization of the exact diffusion algorithm for subspace constrained multitask learning over networks, and derive an accurate expression for its mean-squared deviation when utilizing noisy gradient approximations. We verify numerically the accuracy of the predicted performance expressions, as well as the improved performance of the proposed approach over alternatives based on approximate projections.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.82)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)
Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization
Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints.
Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization
Bach, Stephen, Broecheler, Matthias, Getoor, Lise, O', leary, Dianne
Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. Papers published at the Neural Information Processing Systems Conference.
Training GANs with Centripetal Acceleration
Peng, Wei, Dai, Yuhong, Zhang, Hui, Cheng, Lizhi
Training generative adversarial networks (GANs) often suffers from cyclic behaviors of iterates. Based on a simple intuition that the direction of centripetal acceleration of an object moving in uniform circular motion is toward the center of the circle, we present the Simultaneous Centripetal Acceleration (SCA) method and the Alternating Centripetal Acceleration (ACA) method to alleviate the cyclic behaviors. Under suitable conditions, gradient descent methods with either SCA or ACA are shown to be linearly convergent for bilinear games. Numerical experiments are conducted by applying ACA to existing gradient-based algorithms in a GAN setup scenario, which demonstrate the superiority of ACA.